Skip to content

Fix 7180 autodetect #7691

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

bjonen
Copy link
Contributor

@bjonen bjonen commented Jul 8, 2014

This PR closes #7180

In the terminal:
If maxcols == 0 then auto detect columns
if maxrows == 0 auto detect rows

@jreback jreback added this to the 0.15.0 milestone Jul 8, 2014
@bjonen
Copy link
Contributor Author

bjonen commented Jul 15, 2014

Are you ok with updating the docs as follows? @jorisvandenbossche @jreback

display.max_columns
If max_cols is exceeded, switch to truncate view. 
Depending on `large_repr` objects are either 
centrally truncated or printed as a summary view. 

In case python/IPython is running in a terminal and 
`large_repr` == truncate this can be set to 0 and pandas will 
correctly auto-detect the width of the terminal and print a 
truncated object which fits the screen width. 
The IPython notebook, IPython qtconsole, 
or IDLE do not run in a terminal and hence it is 
not possible to do correct auto-detection. 
'None' value means unlimited.
display.max_rows:
See display.max_columns

@jreback
Copy link
Contributor

jreback commented Jul 15, 2014

is 0 different than None?

@bjonen
Copy link
Contributor Author

bjonen commented Jul 16, 2014

I think it used to be different. Actually I have to check whether the current implementation respects that. You think it's confusing to have 0 and None?

@jreback
Copy link
Contributor

jreback commented Jul 16, 2014

ok it appears that both max_rows/max_columns accept None. can you see:

  • have test for 0 and None yield the same (and if they don't, pls post). This AFAICR should not be the case, but needs checking

@armaganthis3
Copy link

This was meant to fix a regression, where width auto-detection no longer works.
@jreback , you're asking #bjonen to change established behaviour (None means unlimited).
The only reason seems like your own aesthetic preference that 0 and None should be synonymous. they are not. Not in type and not in established meaning.

Why constantly introduce unnecessary changes to existing behavior? there's no reason.
You didn't even read the docstring @bjonen was asking about.

xref the discussion in #7750

cc @jorisvandenbossche

@jreback
Copy link
Contributor

jreback commented Jul 16, 2014

@armaganthis3 I guess you are not reading my comments.

I asked @bjonen to confirm what the functionaility is. I don't really recall what its supposed to be.

Of course 0 and None should, all other things being equal, have the same meaning. Unless their is REALLY REALLY good reason.

Why don't you do some research and post it here. That would actually be helpful.

@armaganthis3
Copy link

have test for 0 and None yield the same (and if they don't, pls post). This AFAICR should not be the case, but needs checking

You asked him to add a test to ensure they behave the same, and you did not read the docstring
he posted above, which explicitly states they are different. an example of carelessness.

Of course 0 and None should, all other things being equal, have the same meaning.

Utter nonsense. You are making stuff up and then proclaiming them as rules. It's a minor point,
but it's another example of so many others in #7750 that it's become disturbing.

@bjonen
Copy link
Contributor Author

bjonen commented Jul 20, 2014

Ok, both before and in the current PR, None is unlimited (no truncation, wrap around instead).

@jorisvandenbossche
Copy link
Member

@bjonen Your updated description in the comment above looks OK. Can you update the PR for this?

A question: what means 'autodetect' in the case of rows?

@bjonen
Copy link
Contributor Author

bjonen commented Jul 29, 2014

Sure I can do that.

Autodetect for rows:
We can get the terminal height which corresponds to the number of rows that can be displayed.

(w,h) = get_terminal_size()

There are two parameters:

row_fac = 0.85  # Fraction of row space to fill on row_num == 0
col_fac = 0.90  # Fraction of screen width to fill on col_num == 0

@jorisvandenbossche
Copy link
Member

Thanks for the clarification!

@bjonen bjonen force-pushed the fix_7180_autodetect branch 2 times, most recently from 8003764 to 2537c4d Compare September 6, 2014 16:38
@bjonen
Copy link
Contributor Author

bjonen commented Sep 6, 2014

Added docs and rebased. Ready from my side.

@jreback
Copy link
Contributor

jreback commented Sep 8, 2014

looks fine, @jorisvandenbossche ?

@jreback
Copy link
Contributor

jreback commented Sep 8, 2014

@bjonen actually, can you add a release note for this in v0.15.0. Is a bigger notice needed? (its sort fo an edge case, thinking no)

for a dataframe prints out fully or just a summary repr.
'None' value means unlimited.
If max_rows is exceeded, switch to truncate view. View
display.max_columns for details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave 'real' explanation of what max_rows does also here, instead of referring to max_cols (I think it is annoying if you do pd.describe_option('max_rows) and you don't get the answer).

@jorisvandenbossche
Copy link
Member

@bjonen I added still some comments on the docstring (sorry for not doing this before you updated and rebased)

I think this is a bug fix? (and so should go in that section?)

@jorisvandenbossche
Copy link
Member

Something else, I tested it, and it doesn't seem to work correctly for me (tested it on Windows)

On the standard size (width of 80), it gives one column too much (and overflows). If I resize the terminal, then it works (eg with width of 75).
Also, if I use this option, there seems to be a lag of almost a second for displaying a DataFrame:

In [20]: df = pd.DataFrame(np.random.randn(100,1000))
In [21]: pd.options.display.max_rows = 10
In [22]: pd.options.display.max_columns = 10
In [23]: %timeit df.__repr__()
100 loops, best of 3: 12 ms per loop

In [24]: pd.options.display.max_rows = 0
In [25]: pd.options.display.max_columns = 0
In [26]: %timeit df.__repr__()
1 loops, best of 3: 3.15 s per loop

And get_terminal_size is not the culprit:

In [28]: %timeit get_terminal_size()
10000 loops, best of 3: 28.7 µs per loop

@bjonen
Copy link
Contributor Author

bjonen commented Sep 8, 2014

@jorisvandenbossche Thanks for looking at this PR.

  1. Overflow issue: I should be subtracting absolute values from the width/height instead of a percentage value. The row count at the bottom, for example, takes the same number of rows independent of the w/h. I fixed it for rows and it works when I resize the terminal on Ubuntu. For columns, there is indeed a problem. I have to investigate this further.

  2. Speed issue: I tested this with (100,100) frames and the speed is ok. This indeed gets very slow for very large dfs. The reason is that I generate the string for the whole frame. Then I know the size of the elements and can cut as many columns as needed to fit to the desired width. I went this route because it was fast to implement.
    Alternatively one could add columns left and right until the screen is filled. This way one only has to format columns that are ultimately printed.

@bjonen
Copy link
Contributor Author

bjonen commented Sep 9, 2014

If you feel the speed issue is a deal breaker, I can attempt an alternative implementation. But this will take some time.

If you want to get this into the next release, I could take care of the overflow issue and then mention that this display gets slow for dfs greater than (100,100).

@jreback
Copy link
Contributor

jreback commented Sep 9, 2014

@bjonen you can check up front whether the size is an issue (e.g. > 100x100)?
if so, then just take some bigger size (say 200x200) and do your calcs (where you are reducing rows and such). then don't ever have to deal with a really big size. (slight downside is that in theory in may not fill up the space, but you can make a good guess I think)

@bjonen bjonen force-pushed the fix_7180_autodetect branch from 3fec539 to 7d027e0 Compare September 10, 2014 00:09
@bjonen
Copy link
Contributor Author

bjonen commented Sep 10, 2014

@jreback Following your suggestion the speed improves quite a bit. auto-detection is still factor 10 slower though:

Without:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(1000,1000))
In [4]: %timeit df.repr()
10 loops, best of 3: 40.3 ms per loop

And with auto detection:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(1000,1000))
In [4]: pd.options.display.max_columns = 0
In [5]: pd.options.display.max_rows = 0
In [6]: %timeit df.repr()
1 loops, best of 3: 343 ms per loop

I fixed the overflow issue on my machine. It would be great if someone could check it (especially on windows). Also updated the docs and rebased. Thanks for the help guys!

@jreback
Copy link
Contributor

jreback commented Sep 10, 2014

this is only for max_columns=o right? (eg auto detection is on)

@bjonen
Copy link
Contributor Author

bjonen commented Sep 10, 2014

Yes, continuing the same session:

In [7]: pd.options.display.max_rows = 20
In [8]: pd.options.display.max_columns = 20
In [9]: %timeit df.repr()
10 loops, best of 3: 19.9 ms per loop

@bjonen
Copy link
Contributor Author

bjonen commented Sep 10, 2014

However, just found these two cases are broken:

max_cols == 0 and max_rows != 0
max_cols != 0 and max_rows == 0

@bjonen bjonen force-pushed the fix_7180_autodetect branch from 7d027e0 to 1f67279 Compare September 11, 2014 19:56
@bjonen
Copy link
Contributor Author

bjonen commented Sep 11, 2014

Fixed remaining issues and added line in release note.

@bjonen
Copy link
Contributor Author

bjonen commented Sep 11, 2014

Checked in the notebook and the truncation seems unaffected by the changes.

@jreback @jorisvandenbossche
When you have time, could you please test the auto-resizing on your machines? On windows you might have to set widht==0.

@jorisvandenbossche
Copy link
Member

@bjonen what do you mean with "have to set width==0"?

@bjonen
Copy link
Contributor Author

bjonen commented Sep 11, 2014

Sorry, that was not very clear. I meant pd.options.display.width. I set this to None in 0.14 to infer the size of the terminal instead of the default 80.

However, I checked briefly and It shouldn't be necessary to adjust this value to get the df displayed across the entire terminal when max_columns = 0.

@jorisvandenbossche
Copy link
Member

@bjonen I can confirm that it now works as expected on Windows

It is also quicker, although you notice it is a bit slower than normal (it is not 'irritating' anymore, but if it would be easily possible to make it as fast as the normal display, that would be a plus I think, but maybe not for this PR)

@jorisvandenbossche
Copy link
Member

But I noticed something else in the printing, not necessarily related to this (it occurs both in max_columns=0 case as default parameter).
So the columns names in the 'left' side from the truncation are misaligned by one space:

In [33]: df = pd.DataFrame(np.random.randn(10,30))

In [34]: df
Out[34]:
         0         1         2         3         4         5         6   \
0 -0.408156  0.416667  0.627905  0.393364 -1.662219 -0.601590 -0.553563
1 -0.447229 -2.427161 -0.663929  0.634625 -0.765336  0.193455  0.326905
2  1.412678 -0.209811 -0.692369 -0.593017 -1.090450  1.132212  0.665779
3  0.748226  0.565255  0.500697 -1.666073  0.378457  0.701584 -1.593952
4 -0.156511  1.810054 -0.566796  0.763574 -1.315514 -0.250945 -0.672395
5  0.857299 -0.153263 -1.718789  0.930296  0.096889  0.262306 -0.419779
6 -1.547626 -0.886810 -0.488989  1.136803 -0.988654 -0.069571 -0.010908
7 -0.396801 -1.282552  1.528441  1.119888  0.374016  0.072011 -0.496032
8  0.466879 -1.849309 -1.668141  0.260994  0.195670  0.983986 -0.659583
9  0.271477  1.455909 -0.949029  1.563097 -0.529267 -0.445536  0.465593

         7         8         9     ...           20        21        22  \
0 -0.027451 -1.492445  0.236370    ...    -0.207124  0.942174  0.993658
1  0.395174 -0.168197 -1.092535    ...     0.870743  0.466931 -0.650112
2  0.012762 -0.145582 -0.857474    ...     1.518490 -1.223810 -0.117409
3 -1.651538  0.546825  0.016696    ...     0.656450  1.289500  0.513666
4 -0.413658  0.923769  0.583012    ...     1.298192 -1.038987  0.738511
5 -0.784654  0.129776 -0.911785    ...    -1.032633  0.235149  0.374142
6  0.917978  1.120702  0.723245    ...    -0.915931 -1.003466  0.432385
7 -0.257428 -0.460610  2.318658    ...     0.430248 -2.039238  1.090794
8  0.438255  0.060507  0.290638    ...    -1.294980 -0.695728  0.078912
9  0.592629  0.159337  1.215215    ...     0.991044  0.330678  0.536602

         23        24        25        26        27        28        29
0  0.040387  0.478344  0.754135 -0.025597  1.516723  0.585786  1.367364
1  0.535671 -0.532744 -0.010694 -0.323556  0.424386 -0.916463  0.309135
2  0.032086  0.359642  0.150719  0.482848 -0.115011  1.714101 -0.604355
3  3.558434 -1.161022  0.973159  0.979117 -0.194935 -0.162460  0.199642
4 -0.193967  0.400109 -0.047031  0.181616  0.533648  0.040881  1.456876
5  1.765558  1.216659  0.506202  0.252224  0.907562 -0.645786 -0.097129
6  1.529280 -0.656682 -0.577339 -0.668127 -0.031473  1.649144 -0.466511
7 -0.137917 -0.228749  0.211181 -0.378891 -1.412424 -1.246940  1.351121
8  0.840317  0.993658  0.627771  1.125502  0.051515 -0.050630  0.474245
9  0.640976  0.954312  2.334820 -0.014047  0.472212 -0.684699 -0.994627

[10 rows x 30 columns]

Update: opened a seperate issue about this, also occurs on master.

@bjonen
Copy link
Contributor Author

bjonen commented Sep 17, 2014

@jorisvandenbossche Thanks for testing the PR! I'm glad to hear it's working on Windows as well.

@bjonen
Copy link
Contributor Author

bjonen commented Sep 18, 2014

Regarding the speed issue: Right now, I don't see a quick fix. So I would do this in a separate PR.

@jreback
Copy link
Contributor

jreback commented Sep 18, 2014

looks fine to me.just needs a rebase.

This is simply a bug fix, so looks ok, unless @jorisvandenbossche thinks should be more prominent in release notes.

@jorisvandenbossche
Copy link
Member

no, looks ok

@jreback
Copy link
Contributor

jreback commented Sep 18, 2014

merged via ccb196f

thanks @bjonen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Enhancement Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

max_columns == 0 incorrectly wraps around for wide dfs
4 participants